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1.   INTRODUCTION 

Few  efficient  iterative  methods  have  been  developed  for  treating 
large  nonsymmetric  linear  systems.   Some  methods  amount  to  solving  the 
normal  equations  A  Ax  =  Ad  associated  with  the  system  Ax  =  b  or  with  some 
other  system  derived  by  a  preconditioning  technique. 

This,  unfortunately,  is  sensitive  to  the  conditioning  of  A  A 
which  is  in  general  much  worse  than  that  of  A.   Techniques  using 
Tchebycheff  iteration  [12]  do  not  suffer  from  this  drawback  but  require 
the  computation  of  some  eigenvalues  of  A. 

A  powerful  method  for  solving  symmetric  linear  systems  is 
provided  by  the  conjugate  gradient  algorithm.   This  method  achieves  a 
projection  process  onto  the  Krylov  subspace  K  =  Span  (r  ,  Ar  , .  . .  ,  A   r  ) 
where  rn  is  the  initial  residual  vector.   Although  the  process  should 
theoretically  produce  the  exact  solution  in  at  most  N  steps,  it  is  well 
known  that  a  satisfactory  accuracy  is  often  achieved  for  values  of  m  for 
less  than  N  [15].   Concus  and  Golub  [5]  have  proposed  a  generalization  of 
the  conjugate  gradient  method  which  is  based  upon  the  splitting  of  A  into 
its  symmetric  and  skew- symmetric  parts. 

The  purpose  of  the  present  paper  is  to  generalize  the  conjugate  gradient 
method  regarded  as  a  projection  process  onto  the  Krylov  subspace  K  .   We 
shall  say  of  a  method  realizing  such  a  process  that  it  belongs  to  the 
class  of  Krylov  subspace  methods.   It  will  be  seen  that  these  methods 
can  be  efficient  for  solving  large  nonsymmetric  systems. 

The  next  section  describes  the  Krylov  subspace  methods  from 
a  theoretical  point  of  view.   In  Section  3  some  algorithms  are  proposed. 
They  are  essentially  the  extensions  of  the  Arnoldi-like  methods  for  solving 
large  eigenvalue  problems  described  in  [18].   Section  4  deals  with  the 
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convergence  of  the  Krylov  subspace  methods.   Finally,  some  numerical 
experiments  are  described  in  Section  5. 

2.   THE  KRYLOV  SUBSPACE  METHODS  —  THEORETICAL  ASPECTS 
2.1.   General  projection  process  —  notations 
Consider  the  linear  system 

Ax  -  b  = .0  (2.1) 

where  A  is  a  (complex  or  real)  N  x  N  matrix  and  let  V  =  [v, , . . . ,  v  ] 

ml       m 

■  N 
be  a  system  of  m  linearly  independent  vectors  in  (p  .   The  projection 

process  onto  the  subspace  K  =  Span  (v.  , . . . ,  v  )  seeks  an  approximation 

m  1       m 

x    to  the  solution  of  (2.1)   by  requiring  that 

x(m)  e  k 

m  (2.2) 

Ax1   -  b  _L  v . ,    j  =  1 ,  2 , . . . ,  m 

r,    .     .     (m)    „     (m)   ...    ,  .     ,     (m)         .  _   . 
Writing  x    =  V   •  y    ,  it  is  immediate  that  y    must  satisfy  the  m  x  m 
m 

linear  system 

VH  AV   •  y(m)  -  VH  b  =  0  (2.3) 

mm  m 

H  H   — T 

where  V  denotes  the  transpose  of  the  conjugate  of  V  :  V  =  V  .   Let 

m  m   m    m 

7T  denote  the  orthogonal  projector  onto  the  subspace  K  .   Then  another 
m  m 

formulation  of  (2.2)  is  the  following 

(m) 


(2.4) 


e  k 

m 

Ax(m)  -  b)  =  0 
m 

It  will  be  assumed  for  simplicity  that  b  £  K  .   We  shall  denote  by  A  the 

m  m 

restriction  of  it  A  to  K  ,   so  that  x    is  the  solution  in  K  of  the  equation 
m      m  —  m 

A  x  -  b  =  0  (2.5) 

m 

(Note  that  b  G  K  so  that  tt  b  =  b.) 

m  m 
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The  problem  (2.1)  is  therefore  replaced  by  the  m-dimensional 
problem  (2.5).   In  order  to  study  the  convergence  properties  of  this 
process,  one  may  express  the  error  in  terms  of  the  distance  between  the 

exact  solution  x*  and  the  subspace  K  ,  that  is  in  terms  of  II  (I  -  it  )x*   . 

m  "      m    " 

See  [8]. 

Note  here  that  when  A  is  Hermitian  definite  positive,  the 

convergence  is  more  easily  studied  by  using  the  fact  that  the  approximate 

solution  x    minimizes  the  error  function  E(x)  =  (x  -  x*)   A(x  -  x*)  over 

all  elements  x  in  K  .   Unfortunately,  this  property  does  not  extend  to  the 

m 

nonsymmetric  case  so  it  becomes  necessary  to  make  a  different  approach. 

Suppose  that  the  exact  solution  x*  is  close  to  K  ,  in  that  tt  x*  is  close 

m  m 

to  x*,  then  it  is  possible  to  show  that  x    is  close  to  ttx*  (hence  to  x*) 

m 

by  showing  that  the  residual  of  tt  x*  for  the  problem  (2.5)  is  small. 

m 

More  precisely, 


Proposition  2.1 

Let  Y  =  1 1  TT  A(I  -  IT  )  II.   Then  the  residual  of  tt  x*  for  problem 
m   "  m       m  "  m 


(2.5)    satisfies 


b  -  A  it  x*     <  Y      (I  -  tt   )x*  (2.6) 

1  m  m      "  —     m"  m        " 


Proof 


b    -    A    TT    X*    =    b    -    TT    ATT    X* 

mm  mm 


=  b   -   TT  A[x*   -    (I   -   7T   )x*] 
m  m 


=  TT  A(I   -   tt   )x* 
m  m 
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Observing  that  (I  -  it  )  is  a  projector  we  can  write 

m 

lib  -  A  IT  X*  II  =  1 1  TT  A(I  -  TT  )(I  -  7T  )  X* 

m  m      "  m       m       m 

<  Y    ||(I  ~  TT  )X*|| 

—  m  "      m    " 


which  completes  the  proof.   D 


As  a  consequence,  we  can  state  the  next  corollary  which  gives  a  bound  for 

Ik*  -  v(m)IL 


Corollary  2.1. 

Let  Y  he  defined  as  above  and  let  K  be  the  norm  of  the  inverse 
m  m 

of  A  .   Then  the  error  x*  -  x    satisfies 
m 


*    -    X(m)||    <     A    +    Y2K2    ||  (I    -    TT    )X*||  (2.7) 


mm  m 


Proof 


By  Proposition  (2.1)  and  the  fact  that  x    -  tt  x*  = 

m 


A      (b  -  A  it  x*) ,   we   get 
m  mm 


1 1 TT    (x*   -   x(m))||  <  Y   <     II  CX   —   tt   )x*||  (2.8) 

"    m  "—mm"  m         " 

/     i   ,       (m)     (m)  \    „  .  . 
(remark  that  tt  x    =  x   ).   Writing 

m 

x*  -  x(m)  =  (T  _  tt  )x*  +  TT  (x*  "  X(m))  (2.9) 

m       m 
and  observing  that  the  two  vectors  on  the  right  hand  side  of  (2.9)  are 
orthogonal,  we  obtain 

||x*    -     x(m)||2    =    ||(I    -    TT     )X*||2    +||7T     (x*    -    X(m))||2 
II  ii  m  ii  ii     m  ii 

which,  in  view  of  (2.8),  gives  the  desired  result  (2.7).  □ 


-5- 

The  above  results  show  that  the  error  ||x    -  x*||  will  be  of 
the  same  order  as  ||  (I  -  it  )x*||  provided  that  the  approximate  problem  (2.4) 
is  not  badly  conditioned. 


2.2.  Krylov  subspace  methods 

Let  x_  be  an  initial  guess  at  the  solution  x*  of  (2.1)  and 
let  r.  be  the  initial  residual  r_  =  b  -  Ax_.   If  the  unknown  x  is 
decomposed  as  x  =  x.  +  z  then  clearly  the  new  unknown  z  must  satisfy 

Az  -  r  =  0  (2.10) 

By  a  Krylov  subspace  method  we  shall  refer  to  any  method 
that  obtains  an  approximation  z    to  problem  (2.10)  by  applying  a 

projection  process  to  the  system  (2.10)  onto  the  Krylov  subspace 

__. -I 

Km  =  Span  [rQ,  Ar  , . . . ,  A   rQ]. 

We  shall  assume  throughout  that  the  vectors  r  ,  Ar  , . . . ,  A   r 

are  linearly  independent,  which  means  that 

dim(K  )  =  m  (2.11) 

m 

If  V  =  [v, , .  .  .  ,  v   is  any  basis  of  K  ,  then  according  to 
m     1       m  m 

o  -.    (tn)     ,  ,      (m)    _,      (m)   ,      (m)  .    , 

Section  2.1,  z    can  be  expressed  as  z    =  V   •  y    where  y    is  the 

m 

solution  of  the  m  x  m  system 

VHAV   •  y(m)  -  VHr  =  0  (2.12) 

mm  m  0 

and  the  approximate  x    of  problem  (2.1)  is  related  to  z    by 

On)        „ (m) 
x    =  xn  +  z 

If  z*  =  A  r_  denotes  the  exact  solution  of  the  system  (2.10), 

then  we  notice  that 

On)     .     (m) 


x 


*  -  xvl,/  =  z*  -  zvm/  (2.13) 


which  means  that  x    and  z    admit  the  same  error  vector  for  (2.1) 
and  (2.10),  respectively. 
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3.   PRACTICAL  METHODS 

Some  algorithms  based  upon  the  Krylov  subspace  methods 

described  above  will  now  be  presented.   We  first  propose  an  adaptation 

of  Arnoldi's  method  [1],  [18]  to  the  solution  of  systems  of  linear  equations. 

The  algorithm  constructs  an  orthonormal  basis  V  =  [v, , . . . ,  v  I  of  K  such 

m     1       m      m 

T 
that  V  AV  has  Hessenberg  form.   An  iterative  version  of  this  method  is 
mm 

also  given  so  as  to  avoid  the  storage  of  too  large  arrays  in  memory.   Then 
another  class  of  algorithms  is  derived  from  the  incomplete  orthogonalization 
method  described  in  [18]. 

3.1.   The  method  of  Arnoldi 

Arnoldi's  algorithm  builds  an  orthonormal  basis  v, , . . . ,  v  of 

1  m 

K     =   Span    [r_,   Ar_,...,   A       r_]   by  the  recurrence 
m  U  U  U 


\+l,k  \+l   =   A\  ~   if1   hik  Vi  (3'1) 

starting  with  v     =  r   /||r    ||  and   choosing  h      ,    i  =   1,...,    k+1   in  such  a 

X.  U     U  XK. 

way  that  v,  , ,  I  v, v,  and  II  v,  , .  II  =  1.   In  exact  arithmetic  the 

k+1    1       k     "  k+1 ' ' 

algorithm  would  be  as  follows. 


Algorithm  3.1 

1.  Compute  r   =  b  -  Ax  and  take  v   :=  r  /||r  ||. 

2.  For  k  :=  1  until  m  do 


w  :=  Av,  -   Z  h..  v.  with  h..  :=  (Av,  ,  v.)  (3.2) 

k   .  ,   Ik  I       lk       k   i 

hk+lik  :-|HI  (3-3) 

k+1       k+l,k 
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See  [18]  for  some  remarks  on  the  practical  realization  of  this  algorithm. 

It  is  easily  seen  that  [v,,  v0  , .  .  .  ,  v  ]  is  an  orthonormal  basis  of  K 
J  12m  in 

H 
and  that  the  matrix  V  AV  is  the  Hessenberg  matrix  H  whose  nonzero 

mm  m 

elements  are   the  h. .    defined  by    (3.2)    and    (3.3).      As   a  consequence 

the  vector  V   rn   in    (2.7)    is   equal   to   3  *  v  v     =   gen    where   3  =  ||rn||. 
m  0  mil  u 

Thus   the   system   (2.7)   becomes 

H     •   y(m)   =   3   •    e.  (3.4) 

m  1 

and  the  approximate  solution  x    defined  in  Section  2.2  reads 

(m)      ,   (m)   . 
x    =  xn  +  z    where 

z(m)  =  3V  H_1en  (3.5) 

mm  1 

The  following  estimate  for  the  residual  norm  ||b  -  Ax    |  is  very 

useful  as  a  stopping  criterion 

||b  _  Ax(m)||=  h..    |eHy(m)|  (3.6) 

"         "    m+l,m  '  m     ' 

Equality  (3.6)  follows  immediately  from  the  relation 

H 
AV  =  V  H  +  h  . ,    v.-  e 
m    mm    m+1 ,  m  m+1  m 

which  can  be  derived  from  the  algorithm  and  from  equality  (2.8). 

An  interesting  practical  method  would  be  to  generate  the 

vectors  v  and  the  matrix  IL  ,  k  =  1,  2,...,  m,...,  to  compute 

periodically  the  estimate  h  , .,     e  y     of  the  norm  of  the  residual 
r  J  m+l,m  '  m     ' 

and  to  stop  as  soon  as  this  is  small  enough.   As  was  suggested  in  [15] 

for  the  symmetric  case,  there  are  various  ways  of  updating  |e  y 

without  even  actually  computing  the  vector  y    .   Let  us  give  a  few 

indications  about  the  problem  of  computing  the  estimation  e  y     , 

m 

since  it  will  appear  in  several  parts  along  the  paper.   Parlett  [15] 

suggests  utilizing  a  recurrence  relation  proposed  by  Paige  and 

Saunders  [14],  which  is  based  upon  the  LQ  factorization  of  H  . 

m 
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Another  interesting  possibility  is  to  perform  the  more 

economical  factorization  provided  by  the  Gaussian  elimination  with 

partial  pivoting  on  the  matrix  H..   The  factorization  of  H.  can  be 

easily  performed  by  using  the  information  at  the  previous  step. 

Supposing  that  no  pivoting  has  been  necessary  for  steps  1  through  j-1, 

and  writing  the  LU  factorization  of  H.,  H.  =  LU,  it  can  be  easily 

J    3 

^        i      i  H   (m)i  . 
seen  that  p.  =  n.,_  .  e  y    is  simply 

j    3+1  >1   '  m  J        ■ 

J-1 

p.  =  h...   .  B  |(n  £,)/u..| 

3  J+1,J    ±=1  i   33 

where  the  £.,  i  =  1,...,  j-1,  are  the  successive  pivots.   More 
generally  it  can  be  shown  that  when  no  pivoting  has  been  necessary 
at  steps  i,  i£  I,  where  IC  {l,  2,...,  j-l},  then  p.  becomes 

This  means  that  p.  can  be  updated  at  each  step  at  a  negligible  cost. 

Finally,  after  it  is  decided  that  the  estimate  of  the  residual  norm 

is  small  enough,  the  final  factorization  of  H  will  be  used  to  fully 

m 

solve  the  system  (3.4).   The  Gaussian  elimination  with  partial 
pivoting  gives  satisfactory  results  in  general,  but  one  might  as  well 
use  a  more  stable  decomposition,  as  the  LQ  decomposition  in  [14],  [15] 
although  at  a  high  cost. 

As  m  increases,  the  process  of  computing  the  v.  becomes, 
unfortunately,  intolerably  expensive  and  core  memory  demanding.   To 
remedy  this,  one  can  use  the  algorithm  in  an  iterative  way,  as  is 
described  next. 


3.2.   Iterative  Arnoldl  method 

Due  to  core  memory  capacities,  the  number  m  of  steps  in 

Algorithm  3.1  is  inevitably  limited.   After  having  computed  the  approximate 

solution  x    with  the  maximum  number  of  steps  allowed,  one  may  find  that 

the  accuracy  is  still  unsatisfactory.   This  naturally  raises  the  question 

of  how  to  improve  the  accuracy  of  x   .   The  simplest  idea  is  to  restart 

the  algorithm  with  x_  replaced  by  the  approximation  x    obtained.   The 

idea  is  similar  to  that  of  the  m  step  steepest  descent  in  the  symmetric 

case.   (See  [6].)   One  can  restart  as  many  times  as  necessary  to  ensure 

satisfactory  accuracy.   We  now  give  a  more  detailed  description  of  this 

iterative  version.   Let  us  start  with  an  initial  guess  x_  and  form 

r_  =  b  -  Ax_.   Then  construct  H  and  V  by  algorithm  (3.1)  and  compute 
0         0  mm 

the  approximate  solution  x    =  x_  +  z    .   The  estimation  (3.6)  can  be 

used  to  determine  whether  the  process  must  be  stopped  or  restarted. 

Suppose  a  restart  is  necessary.   Then  take  x  =  x  +  z    and  compute 

r-  =  b  -  Ax  .   (Remark  that  r   is  also  equal  to  the  residual  r_  -  Az    .) 

Construct  again  V  and  H  starting  with  vn  =  r_/|  r,||  in  Algorithm  3.1. 
°     m      m        °       1    1  "  1" 

Then  an  approximate  solution  z    to  the  equation  Az  =  r   is  obtained 

yielding  the  new  approximation  x  =  x  +  z    to  the  solution  x*  and 

so  forth. 

At  the  s-th  iteration  the  approximate  solution  x  is  equal  to 

s 

x  +  z    +. . .+  z    .   Thus  the  algorithm  can  be  formulated  as  follows. 
(The  subscript  (m)  is  dropped  for  simplifications.) 
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Algorithm  3.2 

1.  Start.   Choose  m  and  x  ;  r   :=  b  -  Ax  . 

2.  For  s  :=  0,  1,. . .  ,  do 

•  compute  v1 ,  v_  , .  .  .  ,  v  and  H  by  Algorithm  3.1  starting  with 

1    z       m      m 

•  Solve  the  system  H   •  y  =  3  *  en 

m  1 


's+1 


•  x 


s+1 


•  r 


s+1 
If  h 


=  V   •  y 
m   J 


=  x  +  z 
s    s+1 

=  r  -  Az   _ 
s     s+1 


e  y  <  e  ,  stop  else  continue 
m+l,m  '  m   ' 


3.3.   Incomplete  orthogonalization  methods 

3.3.1.     The  construction  of  the  vectors  v, . . . . ,  v  by  Algorithm  3.1 

1       m 

amounts  to  orthogonalizing  the  vectors  Av,  against  all  previous  vectors 


V 


,  v  .   This  is  costly  and  some  numerical  observations  suggest  to 


orthogonal ize  Av  against  the  preceding  p+1  vectors  rather  than  all 
(see  [18]). 


The  system  produced  is  such  that  (v.,  v.)  =  6..  for  i,i 
satisfying  |  i- j  |  <^  p. 

Algorithm  3.3 

1 .  Choose  p  and  m  such  that  p  <  m;  compute  r 

2.  Forj  :=  1,  2  , . . . ,  m  do 


0 


:=  b  -  Ax  and 


i   :=  max(l,  j-p+1) 


1 


w  :=  Av .  -      hj  .    v .  with 
0 


i  =  i.   lj  L 


hi-  (Avjf  V±) 


V:=w/(Vi(J  :="wl 


(3.7) 
(3.8) 
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Under  the  assumption  (2.11),  this  algorithm  will  not  stop  before  the  m-th 

step  and  will  produce  a  system  of  vectors  v, , . . . ,  v  locally  orthogonal 

1       m 

and  a  (banded)  Hessenberg  matrix  of  the  form 


H  = 
m 


whose  nonzero  elements  are  computed  from  (3.7)  and  (3.8).   The  generalized 

(m)         .  r        , 
Lanczos  approximatxon  z    must  satisfy  the  equations 

VHAV  y(m)  -  VH  rn  =  0  (3.9) 

mm        m  U 


z(m)  =  V    y(m) 
m 


H 


In  the  present  case,  however,  the  matrix  V  AV  does  not  have 

m  m 

any  particular  structure  as  before,  so  we  need  to  transform  (3.9)  into 

a  simpler  problem. 

a      H    -1   H 
Let  us  set  H  =  (V  V  )  "V  AV  .   Note  that  this  is  just  the 
m     mm     mm 

matricial  representation  of  the  linear  operator  A  =  tt  AiT,   (see  Section 

m    m  K 
'  m 

2.1),  in  the  basis  {v.,  v_,...,  v  }.   It  was  shown  in  [18]  that  H  differs 

1    2.  m  m  

from  H  only  in  its  last  column.   More  precisely 
m 


Theorem  3.1 


(A  r1 


Let  s   =  h  . _   (V"V  )  ~  V"  v  J _ .   Then 
m    m+l,m  mm     m  m+1 


~  H 

H  =  H  +  s  e 
m    m    mm 

Proof.   From  Algorithm  3.3  we  get  the  basic  equation 

AV     =  V     H     +  h   ,,        v    ,,    eH 
m         mm         m+l,m     m+1     m 


(3.10) 


vfalch  yields    (3.10)   on  multiplying  by    (A  )    1va.  D 

mm    m 

Multiplying  (3.9)  by  (V  v  )    gives  the  equivalent  equat 


m  m 


ion 


H  y(m)  -  (A  )"1  VH  r  =  0 

m  0 


m 


m  m 
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Observing  that  (V  v  )   V  r.  =  3c.  where  3  =  ||r_||,  we  obtain  the  system 
m  m     m   U      1  U 

H  y(m)  -  3e.  -  0  (3.11) 

m  1 

If  we  set  y    =  3H  e.  and  y    =  3H  e  ,  then  by  the  Shermann 

ml  ml 

and  Morrison   formula    [7]    these   two  vectors   are   related  by 

y     -  y     -   OH-1  s  (3.12) 

m  m  m        m 

H~(m)  .,.     ,      H  ~-l        N 
where  a  =  ey/(l  +  e     H       s). 
m  m     m       m 

On  the  practical  side  the  only  difficulty  lies  in  the  computation 

of  the  corrective  column  s  .   Note  that  s   =  h  .,    V  v  ,,  and  that  s 

m  m    m+1 ,  m  m  m+1  m 

is  the  solution  of  the  least  square  problem  (see  [19]) 

min  || V  S  -  h  .,    v    ||  (3.13) 

s   "  m     m+l,m  m+1" 

for  which  many  efficient  algorithms  are  available  (see  [3],  [13]).   It 

should  be  added  that  only  a  moderate  accuracy  is  needed  in  practice, 

so   the  bidiagonalization  algorithm  BIDIAG  described  in  [13]  is 

suitable  for  solving  (3.13)  with  moderate  accuracy.   We  can  now  give  an 

algorithm  based  upon  all  the  above  observations. 


Algorithm  3.4.   Incomplete  Orthogonalization  with  Correction 

Start.   Choose  two  integers  p  and  m  with  p  <  m.   Compute  r   :=  b  -  Ax  , 

8-||r0||;  vx  ,-yg. 

Iterate.      Comment    compute   H     and   v. , . . . ,    v    . 

ml  m 

For  j    =   1,    2,...,    mdo 
iQ    :=  max(l,    j-p) 

j 

w    :=   Av.    -      E         (h..     :=    (Av . ,    v.))    x  v. 
J         -|_-i  1J  J         i  ^ 

0 

-j+1    »■  w/(hj+i,-j    :=  HWID 
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Correct: 

1.  Compute  least   square   solution    s     of    (3.13). 

2.  Compute     y      :=   $H       e 
c m  m       n 

x    :=  H_1s 

m       m 

Q    :=   e     y   /(l  +  e     x) 
mm  m 

y      :=  y     -  ax 
m  m 

3.  Form  the  approximate   solution 

(m)  .    tT  ^ 

x  =   x_   +  V      •    y 

Omm 

We  shall  now  give  some  additional  practical  details. 

1.  If  necessary,  the  vectors  v  ,  v_  , .  .  .  ,  v   may  be  stored  in  auxiliary 

memory,  one  by  one  as  soon  as  they  are  computed.   Only  the  p 

vectors  v.,  v.  ,,...,  v.   ,-  must  be  kept  in  main  memory  for  more 
J    3-1       J-P+1 

efficiency. 

2.  The  storage  of  H  now  requires  only  the  storage  of  (p+1)  x  m  elements 

m 

instead  of  the  previous  m  . 

3.  For  the  choice  of  the  integer  p  we  should  first  point  out  that  p  is 

limited  by  the  available  core  memory.   In  theory  the  larger  p,  the 

better.   If  p  is  large,  the  system  (v  , .  . .  ,  v  )  will,  in  practice, 

1       m 

be  close  to  orthogonality  and  the  solution  of  the  least  square 


problem  (3.13)  in  step  correct  becomes  easier  [at  the  limit  if  p  =  m 

H 
then  the  solution  is  just  h  , .,    V  v  .,  =  01.   But  in  that  case  the 

m+l,m  m  m+1 

computations  in  the  step  iterate  are  more  expensive.   If  p  is  too 
small,  on  the  other  hand,  it  is  very  likely  that  the  problem  (3.13) 
will  become  difficult  to  solve  (if  not  impossible  numerically)  as 
the  vectors  (v_ , . . . ,  v  )  will  become  nearly  linearly  dependent.   Note 
that  this  depends  also  upon  m.   When  m  =  p  the  system  is  orthonormal 
and  as  m  increases  it  is  observed  that  the  system  departs  from 
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orthogonality ,  in  a  slow  manner  at  the  beginning.   All  these 

observations  suggest  that  p  must  first  be  chosen  according  to  the 

main  memory  capacity  and  some  arbitrary  limitation  p  _<  p 

Afterwards,  a  maximum  number  of  steps  m    should  be  fixed.   Then 

max 

a  test  must  be  included  at  the  end  of  the  step  iterate  in  order  to 

shift  to  the  correction  step  as  soon  as  the  system  {v.,  v_  , .  .  .  ,  v.,..} 
12       j+1 

is  suspected  to  be  too  far  from  orthonormal,  as  for  example 
if  |  (v.   ,  v  )  |  >  n.  goto  correct 

where  n.  is  a  certain  tolerance.   The  heuristic  criterion  given  above 

is  not  the  best. 

4.   When  the  matrix  A  is  symmetric,  then  by  taking  p  =  2  we  obtain  a 

version  of  the  conjugate  gradient  method  which  is  known  to  be 

equivalent  to  the  Lanczos  algorithm  (see  [14]).   In  that  case  the 

vectors  v.,...,  v  are  theoretically  orthogonal.   Suppose  now  that 
I       m 

A  is  nearly  symmetric  and  take  p  =  2  again.   By  a  continuity  argument 

it  is  clear  that  the  system  (v.,...,  v  )  will  be  nearly  orthonormal, 

1       m 

making  the  choice  p  =  2  optimal  in  a  certain  sense.   This  suggests 
that  when  it  is  known  that  A  is  close  to  a  symmetric  matrix,  p  could 
be  taken  small  (or  even  p  =  2).   However,  it  is  not  easy  to  give  a 
rigorous  meaning  to  the  notion  of  nearly  symmetric  and  it  is  even 
more  difficult  to  monitor  automatically  the  choice  of  the  parameter  p. 
3.3.2.     In  the  following  we  develop  another  algorithm  which  is,  in 
particular,  more  appropriate  for  the  cases  of  almost  symmetric  matrices. 
As  pointed  out  above,  the  correction  step  can  be  expensive  and  one  may 
ask  whether  an  acceptable  accuracy  could  be  achieved  by  ignoring  the 
corrective  step  and  replacing  the  approximate  solution  x    =  x«  +  V  y  by 
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x(m)  =  xA  +  V  y  (3.14) 

0    mm 

The  answer  is  yes,  provided  that  V  , .  is  not  too  far  from 

m+l 

~  H 

orthonormal.   In  effect,  writing  H  =  H  -  s   e  ,  we  can  derive  the 

m    m    mm 

following  analogue  of  (3.12) 

,  H    /\ 

h~j_i      e    y        i 

~  ^      .      m+l ,  m     m     m     £—  1  /  n    ,  c  \ 

y     =  y     + hi ; H       s  (3.15) 

m  m        ..  H   *-l  m        m 

1   -   e      H        s 
mm       m 

u 

It   is  remarkable   that,   by    (3.6),    the   term  h    ,,        e     y     is   equal   to   the 

m+l,m  m  m 

residual  norm  ||r  -  Az    ||  except  for  the  sign,  and  hence  it  becomes 

smaller  as  m  increases.   If  {v.,...,  v  , .,  }  is  nearly  orthonormal 

1       m+l 
ii 

then  V  v    is  nearly  zero  and  so  will  be  s   in  general.   This  shows  that 
m  m+I  m 

in  general  the  second  term  on  the  right  hand  side  of  (3.15)  can  be 

neglected  (in  comparison  with  y  )  as  long  as  V    remains  nearly  orthonormal, 

m  m+l 

This  fact  is  confirmed  by  the  experiments  and  it  is  observed  that  the 

residual  norms  behave  in  the  same  manner  as  the  residual  norms  obtained 

for  the  incomplete  orthogonalization  method  applied  to  the  eigenvalue 

problem  (see  [18],  section  4.2). 

The  residual  norms  II r_  -  Ax  |  decrease  rapidly  until  a  certain 

U     m 

step  and  then  start  oscillating  and  decreasing  more  slowly.   This 
suggests  restarting  immediately  after  a  residual  norm  is  larger  than  the 
previous  one.   Here  again  the  formula  (3.6)  remains  very  useful  for 
estimating  the  residual  norm.   This  leads  to  the  following  algorithm. 
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Algorithm  3.5.   Incomplete  Orthogonalization  without  Correction 
Start,   x  :=  xQ;  r  :=  b  -  AxQ;  g  :=  ||r||;  v±    :=  r/£; 

Iterate.   For  i  =  1,  2, ....  m    do 

, max 

1.  compute 

J 

j+1,j   j+1     j         ij   x 

0 

where  i~  and  the  h. ,'s  are  as  in  Algorithm  3.4. 
0  ij 

2.  Update  the  factorization  of  H.  and  the  estimate  p.  of  the 

J  J 

residual  norm  (see  §3.1). 

3.  Test  for  convergence  performed  every  q  steps  only  (e.g.,  every  q  =  5 
steps) . 

a.  If  p.  <  e  goto  restart. 

b.  If  P.  >  p.    goto  restart:  otherwise  take  m  :=  i  and  continue. 

3        j-q 


Restart: 

z-(m>  :=  BV  H"1e1 
mm  1 


x  :=  x 


+  s<-> 


A~(m) 
r  :=  r  -  Az 


8  :=  \\r 


vx  :=  ?/e 


If  3  _£  e  stop  else  goto  iterate 


The  numerical  experiments  (§5)  will  reveal  that  this  last  algorithm  is  to 
be  preferred  to  the  iterative  Arnold!  Algorithm  and  to  the  incomplete 
orthogonalization  method  with  correction.   Surprisingly,  it  is  often  the 
case  that  no  restart  is  necessary,  even  for  matrices  that  are  not  nearly 
symmetric. 
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We  shall  conclude  this  section  by  a  remark  concerning  the 
application  of  preconditioning  techniques  to  the  algorithms  described 
above.   Suppose  that  we  can  find  a  matrix  M  for  which  linear  systems  are 
easily  solvable  and  such  that  M  A  is  closer  to  the  identity  than  A.   In 
this  case  it  is  advantageous,  in  general,  to  replace  the  system  Ax  =  b 
by  the  new  system  M  Ax  =  M  b  before  applying  one  of  the  previous 
methods.   There  are  two  reasons  for  this.   The  first  is  that  the  rate  of 
convergence  of  the  second  system  will,  in  general,  be  higher  than  that 
of  the  first  because  the  spectrum  will  be  included  in  a  disk  with  center 
one  and  with  small  radius,  and  the  next  section  will  show  that  in  that 
case  the  smaller  the  radius,  the  higher  the  rate  of  convergence.   The 
second  is  that  M  A,  which  is  close  to  the  identity  matrix,  is  clearly 
close  to  a  symmetric  matrix  (the  Identity)  so  that  the  application  of 
incomplete  orthogonalization  without  correction  is  most  effective  (cf  §5.5) 


4.   RATES  OF  CONVERGENCE  FOR  THE  KRYLOV  SUBSPACE  METHODS 
4.1.   Introduction 

We  shall  now  consider  the  problem  of  the  convergence  of  the 
approximate  x    toward  the  exact  solution  x*.   We  first  point  out  that 
the  convergence  is  achieved  in  at  most  N  steps  where  N  is  the  dimension 

of  A.   (This  is  immediate  from  the  fact  that  K^  is  the  whole  subspace 

j.N 

(f  and  from  the  definition  2.2.)   Therefore,  the  problem  is  not  to  show 

the  convergence  but  rather  to  establish  theoretical  error  bounds  showing 

that  one  can  obtain  a  satisfactory  accuracy  for  values  of  m  much  less 

than  the  dimension  N,  which  is  supposed  to  be  very  large.   Another  way 

of  stating  the  problem  is  to  suppose  that  A  is  an  operator  on  a  Hilbert 

space  (N  =  °°)  such  that  the  convergence,  the  rate  of  convergence...,  of  the 
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Inf inite  sequence  x    can  be  discussed.   We  shall  not,  however,  adopt 
this  extension  in  the  present  paper. 

In  view  of  relation  (2.13)  it  is  equivalent  to  study  either  the 

r       (m)      j.       ,  <-       (m)       . 

convergence  of  x    to  x*  or   the  convergence  of  z    to  Z'c.   In 

addition,  Corollary  2.1  shows  that  the  convergence  can  be  studied  in 

terms  of  ||  (I  -  tt  )z*||  where  tt   is  the  orthogonal  projection  onto  the 

m 1 

Krylov  subspace  K  =  Span  [vn,    Ar . , . . . ,  A   "  r_] .   Let  us  denote  by  P, 
m  U    U  (J  k 

the  space  of  polynomials  of  degree  not  exceeding  k.   Then,  a  useful 

expression  for  the  distance  II  (I  -  it  )z*||  can  be  derived  by  remarking 

M      m 

,  n 

that  K  is  nothing  but  the  subspace  of  v   constituted  by  all  the  elements 

q(A)rA  where  q  belongs  to  P   n . 
U  m-1 


Proposition  4.1.   The  distance   (I  -  tt  )z*  between  z*  and  the  Krylov 

II  m      II  _ . £ 

subspace  K  satisfies 

m  

||  (I  -  tt  )z*||  =  min   ||p(A)z*||  (4.1) 

f^m 
\p(0)=l 

Proof .   The  following  equalities  are  easy  to  show 

||  (I  -  tt  )z*||  =  min  ||z*  -  z  ||  =  min   ||z*  -  q(A)r  || 

zGK  qeP  ,  u 

m  n   m-1 

=   min   ||  z*  -  q(A)Az',{||  =   min   ||  (I  -  Aq(A))z*|| 

qeP  qGP 

m-1  n   m-1 

=  min   ||p(A)z*||  □ 
fpePm 

\P(0)-1 

In  order  to  obtain  an  upperbound  for  (4.1)  we  shall  assume  that 
A  admits  N  eigenvectors  <t>    ,  <()_,...,  (})  of  norm  one,  associated  with  the 
eigenvalues  X    ,...,  A  .   Then  the  solution  z*  can  then  be  expressed  as 
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N 


z*  =  Z  a,d>. 
1-1  X  X 


and  we  can  formulate  the  next  theorem. 


Theorem  4.1 


N 


Set  a  =   Z   a.  ,  where  the  a.  are  the  components  of  the 
i=l   *•' 1 

solution  z*  in   the  elgenbasis  of  A. 
Then 


m 


(I  -  7T  )z*||  <  a       mln   max  |p(X.) | 

J  ^  m  j=l,...,  N 
\p(0)=l 


(4.2) 


Proof.   Let  p  G  P  ,  with  p(0)  =  1.   Then 
m 


N 


N 


lp(A)z*||-||p(A)      Z     a.^.||  =  ||Z     p(A   )a  cfrjl 
1=1  j=l 

N  N 

<     Z     ||ot     P(X  )(J)    ||<     Z      |ot    I    |p(X  )| 

i=l  i=l 


N 

Z     la. I 
1=1       x 


max  |p(X . ) | 


j=l,...,    N 

Therefore,  for  any  polynomial  of  degree  not  exceeding  m  such  that  p(0)  =  1 
we  have 


||p(A)z*||  <_  a     max    |p(X.)| 
j=l,...,  N     3 

Hence   min   ||p(A)z*  ||  _<  a   min   max  |p(X.)|  which  by  equality  (4.1) 

f  PGP  f   P^    •  i       w 

)  m  )  m     j=l, . .  .  ,  N 

\p(o)=i  Vp(0)=l  * 

completes  the  proof.  □ 

We  point  out  here  that  from  classical  results  it  can  be  shown 
that  the  polynomial  realizing  the  minimum  in  (4.2)  exists  and  is  unique 
provided  that  m  _<  N  (see  [11]).   We  should  also  add  that  there  is 
unfortunately  no  upper  bound  for  a. 


(4.3) 
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We  shall  set  throughout 

Am) 


min       max     |p(A.)|  (A.  4) 

'  pePm  j-i,...f  N    J 
P(0)=1 


so  that  inequality  (4.2)  simplifies  to 


II  (I  -  77  )z*||  <  a  e(m)  (4.5) 

m 

and  the  result  (2.7)  becomes 

ii     u  (m)  n       ii     ,.  (m)  ii   _,       /  2      2      (m) 

x*  -  xv    '      =     z*  -   z  <  a/1  +  Y     K     E 

n  ii       ii  ii  _  'mm 

We,  therefore,  need  to  show  that  the  sequence  £    decreases 

(N) 
rapidly  to  zero.   Note  that  £    =0  which  shows  again  that  the  process 

will  give  the  exact  solution  in  at  most  N  steps.   The  rate  of  convergence 

of  the  sequence  £    to  zero  provides  a  bound  for  the  actual  rate  of 

convergence.   Estimating  £    is,  unfortunately,  a  difficult  problem 

in  general.   The  number  £    is  the  degree  of  best  approximation  of  the 

zero  function  by  polynomial  of  degree  m  satisfying  the  constraint 

p(0)  =  1,  over  the  set  A  ,  A  ,...,  A   (see  [11]). 


i    r.  n  (m) 

4.2.   An  exact  expression  for  £ 

_.   r  , .   ,     ,        .  ,    ,.     (m) 

The  following  theorem  gives  an  expression  for  £    in  terms 

of  m  +  1  eigenvalues  of  A. 


Theorem  4.2 

Let  m  £  N-l.   Then  there  exist   m+1  eigenvalues  which,  with- 
out ambiguity   can  be  labelled  A.,  An , . . . ,  A  ,,  such  that 

i    I  m+i 


(m) 
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m+1  m+1 

I  77 

j=l  k=l 

L  #3 


Xk 


-1 


Aj  -  Xk 


(4.6) 


We  omit  the  proof  of  this  equality.   An  analogue  result  will 

be  proved  in. a  forthcoming  paper  dealing  with  the  convergence  of  Arnoldi-like 

methods  for  computing  eigenelements. 

The  result  does  not  specify  which  are  the  eigenvalues  X  , .  .  .  ,  X 

1       m+1 

but  it  still  gives  an  interesting  indication.   If  the  origin  is  well 
separated  from  the  spectrum  then  £    is  likely  to  be  very  small.   Indeed 
if  X   is,  for  example,  the  eigenvalue  the  closest  to  zero,  among  those 
eigenvalues  involved  in  the  theorem,  then,  in  general,  we  shall  have 
I A  |  >  |X  -  X  |,  k  =  1,...,  N  as  seen  in  Figure  1.   Therefore, 

K.         X      K. 


m+l 

TT 

k=2 


Xk 


Xk  -  XI 


»  1 


m     a 


>Re(X) 


Figure  1. 
and  it  is  seen  from  (4.6)  that  £    will  be  small.   There  are  particular 
distributions  of  the  eigenvalues  where  £    is  known  exactly  (for 
m  =  N-l) .   But,  in  general,  the  result  (2.14)  is  not  useful  for  giving 
an  estimation  of  the  rate  of  convergence.   Upperbounds  for  £    must  be 
established  for  that  purpose. 


4.3.   Bounds  for  e 
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(m) 


In  the  real  case  one  usually  obtains  bounds  for  £    by  majorizing 

the  discrete  norm  max   |p(A.)|  by  the  continuous  norm  max  |p(A)|  where  I 
j=l,N     2  z£l 

is  an  interval  (or  the  union  of  two  intervals)  containing  the  eigenvalues 

A.  and  not  zero. 
J 

In  the  complex  case,  however,  one  encounters  the  difficulty  of 
choosing  an  adequate  continuum  containing  all  the  eigenvalues  and  not 
zero.   An  infinity  of  choices  are  possible  but  except  for  some  particular 
shapes  such  as  circles,  ellipses...,  there  is  no  simple  expression  for  the 

minimax  quantity   min   max  |p(z)|  . 

p€P    z€D 
m 


{. 


p(0)=l 

We  first  deal  with  the  simplest  case  where  all  the  eigenvalues 
of  A  are  real  and  positive.   The  next  case  to  consider  is  naturally  the 
case  where  the  eigenvalues  are  almost  real.   The  general  case  will  be 
considered  in  subsections  4.3.3  and  4.3.4. 

4.3.1.   Case  of  a  purely  real  spectrum 

Theorem  4.3 

Suppose  that  all  the  eigenvalues  of  A  are  real  and  positive  and 

let  A  .   and  A    be  the  smallest  and  the  largest  of  them. 
min  max a 

Then 

II  CI  -  tt  )z*||  <  a/T  (y)  (4.7) 

m     —    m 

where  a  is  as  before,  Y=(A    +A.)/(A    -A.)  and  where  T   is  the 

max    min    max    min m 

Tchebychef f  polynomial  of  degree  m  of_  the  first  kind. 

This  result  is  an  immediate  application  of  a  well-known  bound 
for  (U.U)   when  the  A  are  real  [2].   It  Is  also  possible  to  establish  some 
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results  when  the  eigenvalues  are  known  to  lie  in  two  or  more  intervals 

(see  [2],  [10]). 

Inequality  (2.11)  shows  that  the  Generalized  Lanczos  method 

converges  at  least  as  rapidly  as  [T  (y)]  "  -  (y  + /y  -  1)    such  that 

m 

n — 

the  rate  of  convergence  is  bounded  by  y   +/y  -  1. 

Finally   note  that  similar  results  can  easily  be  obtained  if 
all  the  eigenvalues  are  purely  imaginary  or  if  they  lie  on  a  straight 
line  of  C,  containing  the  origin. 

4.3.2.   Almost  purely  real  spectra 

In  the  following  we  shall  assume  that  the  spectrum  lies  inside 

a  certain  ellipse  which  has  center  c  on  the  real  line  and  foci  c  +  e, 

c  -  e  where  e  is  the  eccentricity.   Furthermore  we  shall  assume  that  the 

origin  is  not  inside  that  ellipse  (see  Figure  2). 

Alm(z) 

,< —  e > 


i-e—  a — >; 
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Fi         C  F2\     Re(z) 

1- 1 — + > 
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Figure  2. 
Let  us  denote  by  E  the  closed  domain  bounded  by  the  ellipse  defined  above. 
Consider  the  variable  transform  z1  =  (c  -  z)/e;then  £    satisfies  the 
inequality 
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e(m)  <   min     max   |p(z')|  (4.10) 

f    pep    z»eE' 
J     m 

\p(c/e)=l 
where  the  domain  E1  is  bounded  by  the  ellipse  centered  at  origin  with 
eccentricity  one  and  major  semi-axis  a/e.   It  was  shown  by  Clayton  [4] 

that  the  above  mini-max  is  realized  for  the  polynomial  T  (z')/T  (c/e) . 

m      m 

Theorem  4.4 

Assume  that  the  eigenvalues  of  A  lie  within  an  ellipse  with 

center  c  on  the  real  axis,  foci  c  +  e,  c  -  e,  and  with  major  semi-axis 

a.   Suppose  that  the  origin  is  not  inside  this  ellipse.   Then 

,   v     T  (a/e) 

1  m      ' 

In  view  of  (4.10)  this  inequality  is  a  simple  corollary  of 

Clayton's  result.   Since  Che  proof  is  tedious,  we  shall  give  a  direct 

proof  of  (4.11)  and  bypass  Clayton's  result. 

Proof.   Considering  the  particular  polynomial  T  (z')/T  (c/e)  we  get 
m      m 

from  (4.10) 

Tjz') 

(4.12) 


(m)  „ 
£    <  max 


T  (c/e) 
m 


z'GE' 

By  the  maximum  principle,  the  maximum  on  the  right  hand  side  is  realized 
for  z'  belonging  to  the  boundary  3E'  of  the  ellipse  E'  centered  at  the 
origin  and  having  major  semi-axis  a/e  and  eccentricity  one.   Thus  (4.2) 
becomes 

c(m>  <  It  <\l^\    '      max   |T  (z')|  (4-13) 

—  T  (c/e)      ,,.,„,  '  m    ' 
1  m     '    z  G3E' 

1      i 
Consider  now  the  transform  u:   w  ■*-*■  z     =  -^   (w  +  — )  .   It  is  known 

z.  w 

[11),  [17]  that  when  w  belongs  to  the  circle  C  centered  at  the  origin 
and  having  radius  p,  z'  will  belong  to  the  ellipse  9E.  having  eccentricity 
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— 1  '     2 

one  and  major  semi-axis  (p  +  p   )/2.   We  may  take  p  =  a/e  +  /(a/e)   -1 

such  that  3Ep  is  just  9E'.   T  (z)  can  be  defined  by  T  (z)  =  ch(m.u  )  where 

m  m 

u  and  z  are  related  by  ch(u)    =   z.      Setting  e     =  w  we   see   that   another 

definition  for  T    (z)    is  T    (z)   =    (w     +  w     )/2  where  w  and   z   are   related 
m  m 

by(w+w     )/2  =   z.      Hence 

|T    /    tv I  1    i    m  -mi 

max  T    ( z    )      =  max  -r-     w     +  w 

z'e9E'       m  weep 

1    |    m  im6  -m  -im6 . 

max       y   | p   e  +  p     e 

ee[0,2ir] 
It   is   easily  seen   that   the  above  maximum  is  just 


\    (Pm  +  p-m)    =  J    [<f  +   /(f)2    -    l)m  +    £  -    /(f)2    -    I)""1] 
z  zee  e  e 

=   T    £) 
m  e 

which  completes  the  proof.  D 

The  upperbound  T  (a/e)/T  (|c/a|)  for  e  is  asymptotically 

m       m 

equivalent  to 

—1  m 


a/e  +/(a/e)2  -  1 
|c/e|  +/(c/e)2  -  1 

so  that  an  upperbound  for  the  asymptotic  rate  of  convergence  is  given  by 


I     I        /2  2 

c+/c-e  //-,/\ 

t  =  -1— ' 2  (4.14) 

a     +  /a     -   e 
When  the  eigenvalues   are   all   real,    then   the  ellipse  degenerates 
to   the  interval    [X    ,    X   ]    and  we  shall  have  e  =   a  =    (A     -   A   )/2, 


A2 


c  =  (A  +  A  )/2  such  that  t  will  become  y  +  / y     -  1  with 

y  =  (A  +  ^-i)/(^M  -  A  ).   This  means  that  the  result  (2.17)  coincides 

with  that  of  Corollary  2.1  when  the  spectrum  lies  on  the  real  line. 

Consider  now  the  family  of  all  ellipses  having  center  c  and 
major  semi-axis  a  and  let  the  eccentricity  decrease  from  a  to  zero.   Then 
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the  ellipse  will  pass  from  the  interval  (c  -  a,  c  +  a)  to  the  circle 
with  center  c  and  radius  a.   It  is  easily  seen  that  the  bound  (4.14) 


for  the  rate  of  convergence  will  decrease  from  T    =  I  c/a  I  +/(c/a)   -  1 


max 


\*A 


to  T  .   =  c/a  .   Therefore,  we  may  assert  that  the  convergence  is  likely 
min    ■    ■  J 

to  be  better  if  the  eigenvalues  are  close  to  the  real  line  and  that 
when  the  spectrum  has  a  circular  shape  the  convergence  is  likely  to 
be  slower.   Note  that  the  comparison  is  made  for  the  same  relative 
separation  |c/a|  from  the  origin.   The  above  comments  are  confirmed 
by  a  numerical  example  in  section  5.1. 

Before  considering  the  more  general  case  where  the  ellipse 
containing  the  spectrum  does  not  stretch  along  the  real  axis,  let  us  point 
out  that  inequality  (4.11)  cannot  be  improved  as  Clayton's  result  shows. 
By  this  we  mean  that  if  one  replaces  the  discrete  set  {A  , ...,  A  }  by 
the  set  of  all  points  contained  in  an  ellipse  of  the  form  described  in 
Figure  2,  one  cannot  find  a  better  inequality  than  (4.11). 

4.2.3.   Spectrum  contained  in  an  ellipse 

If  the  spectrum  lies  inside  an  ellipse  with  center  c  and  foci 
c  +  e,  c  -  e  where  now  both  c  and  e  are  complex,  it  is  easily  seen  that 
the  proof  of  Theorem  4.4  is  still  valid.   Therefore,  we  can  establish  that 


,    v    |T  (a/e) 
(m)  <   m 


IT  (c/e)|  (4-15> 

m      ' 
Where  c,  e  are  the  center  and  the  "eccentricity"  and  are  complex,  while 
a  the  (complex)  major  semi-axis  is  such  that  c  +  a  and  c  -  a  are  the 
coordinates  of  the  two  points  of  the  ellipse  situated  on  the  major 
semi-axis.   Note  that  a/e  is  real  while  c/e  is  not.   The  interpretation 
of  (4.15)  will,  therefore,  not  be  easy  in  general.   It  can  be  shown, 
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however,  that  the  right  hand  side  of  (4.15)  converges  to  zero  as  m  ■*  °° 
(see  [12]).   The  next  subsection  gives  a  result  which  is  weaker,  in 
general,  but  easier  to  interpret. 

4.3.4.   Spectrum  contained  in  a  circle 

In  this  subsection  we  shall  assume  that  the  spectrum  lies  in 
a  certain  domain  bounded  by  a  circle  having  center  c  and  radius  a. 
Furthermore,  let  us  assume  that  the  origin  lies  outside  the  circle 
(cf.  Figure  3). 


Re(z) > 


Figure  3. 
Then  we  have 
Theorem  4.4 

Suppose  that  there  exists  a.  disk  D(c,a)  with  center  c  and 
radius  a,  that  contains  all  the  eigenvalues  of  A  and  not  the  origin. 
Then 

Am) 


<  m 


aim 

c 


(4.16) 


—       c— z  m 
Proof.   Consider  the  particular  polynomial  p(z)  =  [ ]  .   p  has  degree 

m  and  satisfies  p(0)  =  1.   Hence,  by  (2.13) 
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£    <     max     p(A.)   <  J   <  —    D 

—  't1  —      C  —   C 

j=l,...,  N     J 
The  coefficient  |a/c|  in  (2.21)  is  smaller  than  one  and  one 
can  even  choose  an  "optimal"  circle  for  which  |a/c|  is  the  least.   The 
optimal  center  c  should  minimize     max    | (c  -  Aj)/c|  over  all  complex 

c,  c  ^  0  and  the  optimal  radius  a  is  simply     max    |c  -  X j | .   The 

j=l,... ,  N 

inequality  (2.21)  is  the  best  bound  possible  for  e  that  can  be  obtained 

by  replacing  the  discrete  set  {A  , . .  .  ,  A  }  by  the  disk  D(c,a)  in  the 
formula  (2.13).   This  is  due  to  the  next  theorem,  proved  by  Zorantonello 
in  [22] . 


Theorem  2. 3 

The  polynomial  ((c  -  z)/c)   is  the  polynomial  of  degree  m 

having  least  uniform  norm  over  the  disk  D(c,a)  when  a  <  | c | .   Furthermore 

i  aim 


{, 


mm     max 

peP   zeD(c,a) 

m 


p(0)=l 


5.   NUMERICAL  EXPERIMENTS 

The  experiments  described  in  subsections  5.1  to  5.4  have  been  performed 
on  the  Prime  650  computer  of  the  Department  of  Computer  Science  at  the 
University  of  Illinois  at  Urbana-Champaign.   The  computations  have  been 
made  in  double  precision,  using  a  48-digit  mantissa. 

5.1. 

The  purpose  of  this  first  experiment  is  to  illustrate  the 
comments  of  section  4.3.2  on  the  convergence  properties  in  the  case  of 


d, 

e. 

k 

k 

D,    = 

k 

— e. 

d. 

k 

k 
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complex  eigenvalues.   Let  us  consider  the  block  diagonal  matrix  A  whose 
diagonal  blocks  are  2  x  2  and  have  the  form 


k  +  1,  2, . . . ,  n 


The  d,  and  e  are  chosen  in  such  a  way  that  the  eigenvalues 
A  =  d  +  ie  of  A  lie  on  the  ellipse  having  center  c  =  1  and  major 
semi-axis  a  =  0.8.   The  eccentricity  e  varies  from  e  =  0  to  e  =  0.8. 
The  real  parts  d  of  the  eigenvalues  are  uniformly  distributed  on  the 
interval  [c  -  a,  c  +  a].   In  other  words 

a         no    i  k  "  1  •      (J          V/2  n    (dk  "  C)  ,1/2 
dk  "  °'2  +  n~^T  5  \  -  (a  -  e  )    [1 ~2 ] 

a 

k  =  1,  2,. . . ,  n 
where  c  =  1;  a  =  0.8;  0  <_  e  j£  0.8.   The  number  of  blocks  is  n  =  40  so 
that  A  has  dimension  N  =  80. 

We  compare  for  different  values  of  e  the  estimated  logarithmic 
rates  of  convergence  p    =  Log (t),  where  T  is  given  by  (4.14),  with  the 
"actual"  logarithmic  rates  -  —  Log(||x*  -  x    ||)  where  x*  and  x    are  the 
exact  and  the  approximate  solutions,  respectively.   The  method  used  was 

Arnoldi's  algorithm  described  in  section  3.1.   The  right  hand  side  b  of 

T 
the  system  Ax  =  b  was  the  vector  b  =  Af  where  f  =  (1,  1, . . .  ,  1)   so 

the  solution  is  equal  to  f.       The  starting  vector  xQ  was  set  to  zero. 

The  next  table  gives  the  results  obtained  when  m  =  30  for  various  values 

of  e. 
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TABLE  1. 

e 

||x*- 

■x(m)|| 

P     «_ 

act 

p 
est 

0.00 

2.68 

x  10"3 

0.199 

0.223 

0.10 

2.38 

x  10"3 

0.201 

0.224 

0.20 

2.11 

x  10~3 

0.205 

0.228 

0.30 

1.69 

x  10~3 

0.212 

0.237 

0.40 

1.18 

x  10"3 

0.225 

0.250 

0.50 

6.71 

x  10"4 

0.243 

0.270 

0.60 

2.62 

x  10~4 

0.275 

0.303 

0.70 

4.22 

x  10~5 

0.335 

0.367 

0.75 

6.40 

x  10"6 

0.398 

0.432 

0.79 

1.62 

x  10"7 

0.521 

0.555 

0.80 

1.55 

x  lO"10 

0.753 

0.693 

Note  that  in  passing  from  e  =  0.79  to  e  =  0.80  the  spectrum  of  the  matrix 
A  becomes  purely  real  and  consists  in  40  double  eigenvalues,  which  explains 
the  jump  in  the  actual  rate  of  convergence. 


The  values  p    and  p    of  Table  1  are  plotted  in  the  next 
act      est 


figure. 
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Figure  4 


We  shall  compare  in  the  following  experiment  the  method  of 


H       H 
conjugate  gradients  applied  to  the  problem  A  Ax  =  A  b  with  the  iterative 


Arnold!  algorithm.    Consider  the  block-tridiagonal  matrices 
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A  = 


B   -I 

\   \ 
-I  \   \ 

\  \   \ 
\  \   \ 

\   ^   N 
\   \  -I 
\   v 
-I   B 


with   B  = 


4  a 

b  \   \ 
\  \ 
\  \ 
\  \ 
\  \ 

b  4 


\ 


and  a  =  -1  +  6 ;  b  =  -1  -  6. 

These  matrices  come  from  a  discretization  of  partial 
differential  equations  involving  a  non-selfadjoint  operator  (see  [12], 
[18]).   When  6  is  small  the  matrix  A  is  almost  symmetric.   The  conjugate 
gradient  algorithm  was  run  for  the  following  case:  6  =   0.01,  B 

has  dimension  10  and  A  has  dimension  200.   The  right  hand  side  b  was  set 

T 
to  Af  where  f  =  [1,...,  1]   and  the  initial  vector  was  chosen  randomly. 

We  have  compared  the  results  with  those  obtained  with  the  iterative 

Arnoldi  method  using  10  steps  per  iteration  (m  =  10)  and  20  steps 

per  iteration.   The  initial  vector  as  well  as  the  right  hand  side  are 

the  same  as  above.   Figure  5  shows  in  a  logarithmic  scale  the  evolution 

of  the  error  norms  obtained  for  the  same  total  number  of  steps. 

Notice  that  although  the  total  number  of  steps  required  to  achieve 

convergence  is  smaller  with  Arnoldi' s  method,  the  total  amount  of  work 

required  in  this  example  is  in  favor  of  the  conjugate  gradient  method 

because  the  cost  of  computing  Av  is  not  high.   The  method  of  Arnoldi  will 

be  appropriate  whenever  the  cost  of  computing  Av  dominates  all  the  other 

costs  in  each  step  but  this  will  not  always  be  the  case.   Figure  5  also 

shows  that  when  the  matrix  by  vector  multiplication  is  costly,  it  may  be 

advantageous  to  choose  m  as  large  as  possible. 
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Figure  5.   Conjugate  gradients  for  A  Ax  =  A  b  (upper  curve)  and 
iterative  Arnoldi  method.   p  =  10  middle  curve, 
p  =  20  lower  curve. 
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5.3.  In  the  previous  example,  the  matrix  treated  is  nearly  symmetric 
and  so    the  use  of  the  incomplete  orthogonalization  method  without 
correction  is  more  suitable.   Taking  p  =  2,  and  starting  with  the  same 
initial  vector  as  in  the  experiment  of  5.2,  yielded  a  rapidly  decreasing 
sequence  of  residual  norm  estimates.   No  restart  was  necessary 

and  convergence  occurred  after  90  steps  with  a  residual  norm  equal  to 
4.6  x  10   .   Clearly  the  amount  of  work  required  here  is  far  less  than 
that  required  by  either  of  the  methods  compared  in  5.2. 

5.4.  We  shall  now  compare  the  incomplete  orthogonalization  methods  with 
and  without  corrective  step  on  the  100  x  100  block  tridiagonal  matrix  A 
of  §5.2  obtained  by  taking  6  =  0.2.   In  a  first  test  an  iterative  method 

based  upon  the  incomplete  orthogonalization  algorithm  with  correction 

H 
(Algorithm  3.4)  was  tried.   As  soon  as  the  estimate  ^h  M     e  y   of 

m+l , m   m  m 

the  residual  norm  stops  decreasing  or  when  the  number  of  steps  reaches 

the  maximum  number  of  steps  allowed,  m    =40,  the  algorithm  is  halted, 

max 

a  corrective  step  is  taken  and  the  algorithm  is  either  stopped  (if  the 
residual  norm  is  small  enough)  or  restarted.   For  the  present  example  the 

algorithm  halted  first  at  m  =  20  and  gave  a  residual  norm  of  1.8.   After 

-3 
the  correction  step,  the  residual  norm  dropped  down  to  6.2><10   .   In  the 

second  iteration  the  algorithm  halted  at  m  =  m    =40  and  gave   the 

max 

residual  norms  9.6x10  "  before  the  correction  and  1.14x10   after. 
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It  is  important  to  mention  that,  here,  the  corrective  steps 

necessitate  the  use  of  the  bidiagonalization  algorithm  to  compute  the 

corrective  column  s  ,  which  is  usually  very  expensive. 

m 

The  results  obtained  with  the  incomplete  orthogonalization 
method  without  correction  are  by  far  superior  from  the  point  of  view 

of  the  run  times.   Algorithm  3.5  was  first  tested  with  p  =  2. 

st 
At  the  1   iteration  the  residual  norms      decreased  from  7.6  to 

1.8  at  the  15th  step  and  then  a  restart  was  made.   At  the  2 

— fi 
iteration  the  residual  norms  kept  decreasing  rapidly  to  2.1  x  10 

at  the  60th  step.   The  test  with  p  =  A  yielded  a  steadily  decreasing 

sequence  of  residual  norm  estimates  and  therefore  no  restart  has  been 

necessary.   The  final  residual  norm  obtained  at  m  =  60  was  7.88  x  10 


5.5.   Finally  we  shall  describe  an  experiment  on  a  more  difficult  example 
considered  in  [19].   The  runs  reported  below  have  been  made  on  a  CDC 
CYBER  175  computer  using  a  word  of  60  bits  and  a  mantissa  of  48  bits 
(single  precision).   The  problem  Ax  =  b  treated  has  dimension  N  =  1000 
and  the  nonzero  part  of  A  consists  in  7  diagonals. 


A  = 


St  St 

(The  nonzero  elements  of  the  1  '  row  and  1    column  of  A  are  A   ,  A   , 

Ai,io'  Ai,ioo'  V  Aio,r  Aioo,r}    The  problem  ori8inated  f™  the 

simulation  of  a  reservoir,  and  is  known  to  be  badly  conditioned.   It  has 
been  solved  in  [18]  by  using  Chebychev  iteration  combined  with  a 


-35- 

preconditioning  technique.   The  matrix  A  was  first  decomposed  as  A  =  LU  +  F 
where  M  =  LU  is  an  approximate  LU  decomposition  of  A  provided  by  one 
step  of  the  SIP  algorithm  described  in  [21].   Then  Richardson  iteration 
was  run  for  the  problem  M  Ax  =  M  b,  yielding  the  sequence  of  approximate 
solutions 

x(k+i>  ,  x(k)  +  tfc  M-i  r(k)  (5  2) 

(k)  (k) 

where  r    is  the  residual  b  -  Ax    and  t.  is  an  acceleration  parameter. 

k 

The  acceleration  parameters  toere  first  chosen  a  priori  and  as  the 
iteration  proceeded,  they  were  periodically  adjusted  in  such  a  way  that 
the  iteration  (5.2)  matches  the  (optimal)  Chebyshev  iteration  [12] 
for  the  problem  M  A  =  M  b.   After  60  steps  the  residual  norm  has 
decreased  by  a  factor  of  (see  [19]): 

||r(60)|M|r(0)|h  2.025  xlO-5 
The  initial  vector  x^  was  generated  randomly.   Note  that  an  important 
part  of  the  calculations  lies  in  the  computation  of  a  few  eigenvalues 
of  A,  as  these  are  needed  for  determining  the  optimal  parameters  t  . 
Two  runs  have  been  made  with  Algorithm  3.5,  the  first  with 
p  =  2  and  the  second  with  p  =  4.   The  same  preconditioning  matrix  M  =  LU 
as  above  has  been  used.   Figure  6  shows  the  evolution  of  the  residual 
norms  ||M   Ax    -  M  b||  and  confirms  the  remarks  ending  section  3. 
In  either  case,  no  restart  was  necessary  and  at  the  60th  step, 
the  actual  residual  norms  ||b  -  Ax^  '  ||  decreased  by  a  factor  of 

l|r(60)||/||r(0)||  -  4.44  x  10"7  for  p  =  2 

and  l|r(60)||/||r(0)||  -  1.62  x  10"7  for  p  =  4 
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Clearly,  here  the  choice  p  =  2  is  more  suitable  than  p  =  4.   Note  that, 
with  p  =  2,  each  step  of  Algorithm  3.5  requires  about  21  N  operations 
while  each  step  of  the  first  method  requires  an  average  of  16.7  N 
operations  per  step  [19].   Considering  that  it  takes  40  steps  for  the 
second  method  to  get  the  residual  norm  reduced  by  a  factor  of 
||  r     ||/||r    ||  -  3.3  x  10   ,  it  is  easily  seen  that  the  total  number 
of  operations  is  about  16%  less  with  Algorithm  3.5.   Thus,  the  total 
numbers  of  operations  are  comparable.   The  first  method  requires, 
however,  5  N  more  memory  locations  than  the  second.   (These  are  used 
to  estimate  the  eigenvalues  of  M  A.)   Let  us  mention  that  on  another 
example  similar  to  the  present  one,  the  Chebyshev  iteration  failed  to 
converge,  while  the  I.O.M.  gave  the  solution  without  any  problem  with  p  =  2, 
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Figure  6.   Convergence  of  algorithm  3.5  on  example  of  §5.5- 
Upper  curve  p  =  2,  lower  p  =  4. 
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